Improving Statistical Machine Translation Performance by Oracle-BLEU Model Re-estimation
Abstract
We present a novel technique for training translation models for statistical machine translation by aligning source sentences to their oracle-BLEU translations. In contrast to previous approaches which are constrained to phrase training, our method also allows the re-estimation of reordering models along with the translation model. Experiments show an improvement of up to 0.8 BLEU for our approach over a competitive Arabic-English baseline trained directly on the word-aligned bitext using heuristic extraction. As an additional benefit, the phrase table size is reduced dramatically to only 3% of the original size.
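As a rough illustration of what an "oracle-BLEU translation" is, the sketch below picks, from an n-best list, the hypothesis that scores highest against the reference under a sentence-level BLEU. It is a minimal sketch under stated assumptions, not the authors' procedure: the add-one smoothing, the n-best input, and the names `sentence_bleu` and `oracle_translation` are all hypothetical choices made for this example, and the abstract does not say how the oracle is actually computed.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing (an assumed variant)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_prec += math.log((matches + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec / max_n)


def oracle_translation(nbest, reference):
    """Return the n-best hypothesis with the highest sentence-level BLEU."""
    ref = reference.split()
    return max(nbest, key=lambda h: sentence_bleu(h.split(), ref))


if __name__ == "__main__":
    nbest = [
        "the cat sat on mat",
        "the cat sat on the mat",
        "a cat is on a mat",
    ]
    print(oracle_translation(nbest, "the cat sat on the mat"))
```

In a setup like the one the abstract describes, the source sentence would then be aligned to the selected oracle hypothesis, and the phrase and reordering models re-estimated from those alignments.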
Similar resources
An Efficient Two-Pass Decoder for SMT Using Word Confidence Estimation
During decoding, the Statistical Machine Translation (SMT) decoder travels over all complete paths on the Search Graph (SG), seeks those with cheapest costs and backtracks to read off the best translations. Although these winners beat the rest in model scores, there is no certain guarantee that they have the highest quality with respect to the human references. This paper exploits Word Confiden...
Sentence selection for improving the tuning process of a statistical machine translation system
This paper describes a sentence selection strategy for tuning a statistical machine translation system based on Moses that translates Spanish into English. This work proposes two techniques for selecting the source sentences of the development corpus that are most similar to the sentences to be translated (the source test sentences). With this selection, better model weights are obtained to be used later...
Topic and Sentiment in Phrase-based Statistical Machine Translation
In this paper, we model two textual properties, topic and sentiment, at the sentence and document levels, with the goal of improving the performance of machine translation by taking into account this information in source and target sentences. In the topical similarity approach, we augment the source sentence with the keywords extracted from its adjacent sentences and re-rank the candidate targ...
Improving the Performance of GIZA++ Using Variational Bayes
Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We apply one such Bayesian technique, variational Bayes, to GIZA++, a widely-used piece of software that computes word alignments for statistical machine translation. We show that using variational Bayes improves the performan...
Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation
This paper describes a new method to compare reordering constraints for Statistical Machine Translation. We investigate the best possible (oracle) BLEU score achievable under different reordering constraints. Using dynamic programming, we efficiently find a reordering that approximates the highest attainable BLEU score given a reference and a set of reordering constraints. We present an empiric...